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ABSTRACT 

Motivation: Networks are widely used as structural summaries of bio- 
chemical systems. Statistical estimation of networks is usually based 
on linear or discrete models. However, the dynamics of biochemical 
systems are generally non-linear, suggesting that suitable non-linear 
formulations may offer gains with respect to causal network inference 
and aid in associated prediction problems. 

Results: We present a general framework for network inference and 
dynamical prediction using time course data that is rooted in non- 
linear biochemical kinetics. This is achieved by considering a dynam- 
ical system based on a chemical reaction graph with associated 
kinetic parameters. Both the graph and kinetic parameters are treated 
as unknown; inference is carried out within a Bayesian framework. 
This allows prediction of dynamical behavior even when the underlying 
reaction graph itself is unknown or uncertain. Results, based on (i) data 
simulated from a mechanistic model of mitogen-activated protein 
kinase signaling and (ii) phosphoproteomic data from cancer cell 
lines, demonstrate that non-linear formulations can yield gains in 
causal network inference and permit dynamical prediction and uncer- 
tainty quantification in the challenging setting where the reaction graph 
is unknown. 

Availability and implementation: MATLAB R2014a software is avail- 
able to download from warwick.ac.uk/chrisoates. 
Contact: c.oates@warwick.ac.uk or sach@mrc-bsu.cam.ac.uk 
Supplementary information: Supplementary data are available at 
Bioinformatics online. 



1 INTRODUCTION 

Statistical network inference techniques are widely used in the 
analysis of multivariate biochemical data (Ellis and Wong, 2008; 
Sachs et aL, 2005). These techniques aim to make inferences re- 
garding a network N whose vertices are identified with biomole- 
cular components (e.g. genes or proteins) and edges with (direct 
or indirect) regulatory interplay between those components. 

Network inference methods are typically rooted in linear or 
discrete models whose statistical and computational advantages 
facilitate exploration of large spaces of networks (e.g. Ellis and 
Wong, 2008; Maathuis et aL, 2009; WerhH et aL, 2006). On the 
other hand, when the network topology is known, non-linear 
ordinary differential equations (ODEs) are widely used to 
model biochemical dynamics (Chen et aL, 2009; Kholodenko, 
2006). The intermediate case where ODE models are used to 
select between candidate networks has received less attention. 
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We propose a general framework called 'Chemical Model 
Averaging' (CheMA) that uses biochemical ODE models to 
carry out both network inference and dynamical prediction. In 
summary, we consider a dynamical system dX/dt=fQ(X, 0) 
where the state vector X contains the abundances of molecular 
species, G is a chemical reaction graph that characterizes reac- 
tions in the system, /^^ is a kinetic model that depends on G, and 0 
collects together all unknown kinetic parameters. A causal net- 
work N is obtained as a coarse summary N(G) of the reaction 
graph G in which each chemical species appears as a single node, 
and directed edges indicate that the parent is involved in chem- 
ical reaction(s), which have the child as product (we make these 
notions precise below). Given time course data V consisting of 
noisy measurements of X, we carry out inference and prediction 
within a Bayesian framework. In particular, we treat G itself as 
unknown and make inference concerning it using the posterior 
distribution, 

p(G\V) a p(G) [ p(V\e, G)p{e\G)de (1) 



marginal likelihood p{T>\G) 

where the marginal likelihood piV\G) captures how well the 
chemical reaction graph G describes data V, taking into account 
both parameter uncertainty and model complexity and p{0\G) is 
a prior density over the kinetic parameters. In contrast to linear 
or discrete models that are motivated by tractability, our likeli- 
hood p{V\0, G) depends on (richer) reaction graphs G and their 
associated kinetics. 

This article makes three contributions: (i) A general frame- 
work for joint network learning and dynamical prediction 
using ODE models, (ii) a specific implementation ('CheMA 
1.0'), rooted in Michaehs-Menten kinetics, that uses 
Metropolis-within-Gibbs sampling to allow Bayesian inference 
at feasible computational cost and (iii) an empirical investigation, 
using both simulated and experimental time course data, of the 
performance of CheMA 1.0 relative to several existing 
approaches for network inference and dynamical prediction. 

The statistical connection between linear ODEs and network 
inference using linear models has been discussed in Oates and 
Mukherjee (2012) and exploited in Bansal et aL (2006), Gardner 
et aL (2003). Several approaches based on non-linear ODEs have 
been proposed, including Aijo and Lahdesmaki (2010); Honkela 
et aL (2010); Nachman et aL (2004); Nelander et aL (2008). This 
article extends these ideas by formulating a Bayesian approach to 
both network inference and dynamical prediction that is rooted 
in chemical kinetics. Bayesian model selection based on non- 
linear ODEs has been shown to be a promising strategy for 
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elucidation of specific signaling mechanisms (e.g. Xu et ai, 2010). 
Our work differs in motivation and approach in that we exploit 
automatically generated rather than hand-crafted biochemical 
models, thereby allowing full network inference without 
manual specification of candidate models. Oates et al. (2012) 
performed Bayesian model selection by comparing steady- state 
data with equilibrium solutions of automatically generated ODE 
models. This article extends this approach to time course data 
and prediction of dynamics. 

There are several considerations that motivate CheMA: (i) 
inference in biological systems is complicated by correlations be- 
tween components that are co-regulated but not causally linked. 
It is well known that, under a linear formulation, the causal 
network TV is in general unidentifiable (Pearl, 2009). For ex- 
ample, it may not be possible to orient certain edges, or edges 
may be inferred between co-regulated nodes due to strong asso- 
ciations between them. Non-linear kinetic equations, in contrast, 
are able to confer asymmetries between nodes and may be suffi- 
cient to enable orientation of edges (Peters et al, 201 1), although 
we note that causal inference using non-linear models still 
requires a number of strong assumptions (Pearl, 2009). As a 
consequence, CheMA can in principle aid in causal network in- 
ference, and empirical results below support this, (ii) In contrast 
to linear models, in CheMA, the mechanistic roles of individual 
variables are respected. This facilitates analysis of data obtained 
under specific molecular interventions and enhances scientific 
interpretability. (iii) Prediction of dynamical behavior (e.g. re- 
sponse to a stimulus or to a drug treatment) in general depends 
on the chemical reaction graph. In settings where the graph itself 
is unknown or uncertain (e.g. due to genetic or epigenetic con- 
text), CheMA allows prediction of dynamics by averaging over 
an ensemble of (automatically generated) candidate reaction 
graphs. 

The CheMA framework is general and can in principle be used 
in many settings where kinetic formulations are available to de- 
scribe the dynamics, including gene regulation, metabolism and 
protein signaling. For definiteness, in this article, we focus on 
protein signaling networks mediated by phosphorylation and 
provide a specific implementation of the general framework. 
Phosphorylation kinetics have been widely studied (Kholodenko, 
2006), and ODE formulations are available, including those 
based on Michaelis-Menten kinetics (Leskovac, 2003). 

The remainder of the article is organized as follows. First, we 
introduce the model and associated statistical formulation. 
Second, we discuss network inference and dynamical prediction 
within this framework. Third, we show empirical results, on 
simulated and experimental data, comparing CheMA 1.0 with 
several existing approaches. Finally, we discuss our findings and 
suggest several directions for further work. 

2 METHODS 

Below we describe a first implementation of the CheMA framework, 
called CheMA 1.0, for the specific context of protein phosphorylation 
networks. Figure 1 provides an outline of the workflow below. 

2.1 Automatic generation of reaction graphs G 

We construct reaction graphs for p proteins {X\ . . .Xp] = V. Each Xj can 
be phosphorylated to X*; the set of phosphorylated proteins is V*. 




Automatic Generation 
of Reaction Graphs G 



X = /g(X,6>) 

fC _ VoC* 
Jg — C*+Ko 
VaA*C 



Automatic Generation 
of Kinetic Models fc 



Interventions on 
the System 




Prediction of 
Dynamic Response 



Fig. 1. CheMA. Chemical reaction graphs G summarize interplay that is 
described quantitatively by kinetic equations /q. Candidate graphs G are 
scored against observed time course data D in a Bayesian framework. A 
network gives a coarse summary of the system; marginal posterior 
probabilities of edges in quantify evidence in favor of causal relation- 
ships. The reaction graph G (and N) is treated as an unknown, latent 
object and the methodology allows Bayesian prediction of dynamics 
(including under intervention) in the unknown graph setting 



Phosphorylation reactions Xi X* are catalyzed by enzymes E e Ef, 
the subscript indicates that each protein may have a specific set of en- 
zymes (kinases). We consider the case in which the kinases themselves are 
phosphorylated proteins, i.e. Sf c V* (if phosphorylation of Xi is not 
driven by an enzyme in V*, we set Si = 0). For simplicity, we do not 
consider multiple phosphorylation sites, other post-translational modifi- 
cations (e.g. ubiquitinylation), protein degradation or spatial effects. The 
ability of enzyme E e Sj to catalyze phosphorylation of Xj may be in- 
hibited by proteins / e Xi^e ^ V*; the double subscript indicates that in- 
hibition is specific to both target Xi and enzyme E (see below). 

The reaction graph G provides a visual representation of the sets 
Si and Xi^E', Figure 1 shows an illustrative example on three proteins A, 
B and C. A causal biological network N{G) is formed by drawing 
exactly p vertices and edges (/, j) indicating that X* is either an enzyme 
catalyzing phosphorylation of Xp or an inhibitor of such an enzyme. 
That is, e N <^ i e Sj v 3E ■ i e Xj^e- For the example shown in 
Figure 1, the corresponding network N is the directed graph 
C ^ B. 



2.2 Automatic generation of kinetic models fQ 

The reaction graph G can be decomposed into local graphs describing 
enzymes (and their inhibitors) for phosphorylation of protein Xi. For 
simplicity of exposition, we consider inference concerning G,. Thus, Xi 
plays the role of the substrate; following conventional notation in enzyme 
kinetics, we refer to Xi using the symbol S and use [•] to denote concen- 
tration of the chemical species indicated by the argument. 

We use kinetic models based on Michaelis-Menten functionals 
(Leskovac, 2003). Here we restrict attention to a relatively simple model 
class, but more complex dynamics could be incorporated. The rate of 
phosphorylation due to kinase E is given by Ve[E\[S^' /{[Si' + K),), 
which acknowledges variation of kinase concentration [£] and permits 
kinase-specific response profiles, parameterized by Ke and h, with rate 
constant Ve- Below, the Hill coefficient h is taken equal to 1 (non-coopera- 
tive binding). We consider competitive inhibition, where substrate and 
inhibitor / compete for the same binding site on the enzyme 
(EI ^ E ^ ES ^ E+ S*). When multiple inhibitors are present, they 
are assumed to act exclusively, competing for the same binding site on 
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the enzyme (EI ^ E ^ ET), corresponding mathematically to a rescaling 
of the MichaeHs-Menten parameter Ke{\ + We do 

not model phosphatase specificity; in particular, dephosphorylation is 
assumed to occur at a rate Ko[aS'*]/([aS'*] + A^o), depending on a 
Michaelis-Menten parameter Kq and taking a maximal value Vq. 

Combining these assumptions produces a kinetic model for phosphor- 
ylation of substrate S, given by fo^si^, ^s) = 



Vo[S*] 



-E 

Ee£s 



Ve[E][S] 



[S] + Ke[1 



(2) 



where the parameter vector 6s contains the maximum rates V and 
Michaelis-Menten constants K, and the (local) graph Gs specifies the 
sets Ss and Ts,e- The complete dynamical system fg is given by taking, 
for each species 5 g V, a model akin to Equation (2). In this way, we are 
able to automate the generation of candidate parametric ODE models. 

2.3 Model averaging and the network 

Evidence for a causal influence of protein / on protein j is summarized by 
the marginal posterior probability of a directed edge (/, j) in the network 
A^. This is obtained by averaging over all possible reaction graphs G, as 



p{{iJ)eN\V)- 



J2 P(P\G)P{G) 

G-.ieGj 

Y,P(P\G)P{G) ' 



(3) 



We note that while the marginal posterior in Equation (3) is an intuitive 
summary, the full posterior over reaction graphs G is also available 
for more detailed exploration. In the same vein, model averaging is 
used to compute posterior predictive distributions (see Supplementary 
Material). 

Following work in structural inference for graphical models (Ellis and 
Wong, 2008), we bound graph in-degree; in particular, we consider only 
those sets of kinases Ss ^ V* that satisfy l^^^l < ci, and similarly, we 
bound the number of inhibitors IX^^^I < C2 (see Section 3.1 below). 

Bayesian variable selection requires multiplicity correction to control 
the false discovery rate and avoid degeneracy (Scott and Berger, 2010). 
For phosphorylation networks, we achieve multiplicity correction using a 
prior p{G) uniform over the number of kinases, and for a given kinase, 
uniform over the number of kinase inhibitors: 



n 



p 

\^ue\ 



(4) 



We note that the above prior does not include biological knowledge 
concerning specific edges; informative structural priors are also available 
in the literature (Mukherjee and Speed, 2008). 

2.4 Statistical formulation: CheMA 1.0 

2.4.1 Time course data Data V comprise measurements yi{tj) and j^* 
{tj) proportional to the concentrations of unphosphorylated and phos- 
phorylated forms, respectively, of protein / at discrete times tp 0 <j < n. 
Data are scale normalized to give unit mean for each protein 
(JljyMj)= = !)■ In CheMA 1.0, observables are related to 

dynamics via 'gradient matching'. We follow Aijo and Lahdesmaki 
(2010); Bansal et al. (2006); Gates and Mukherjee (2012) and use a 
simple Euler scheme that approximates the gradient dXi/dt at time tj 
by z,(/y) = (y*(/y) — y1{tj-\y)/{tj — tj-\). We note that more accurate ap- 
proximations could be used, at the cost of requiring more data points or 
additional modeling assumptions (see Section 4). The ODE model fG,s 
[Equation (2)] is formulated as a statistical model by constructing, con- 
ditional upon (unknown) Michaelis-Menten parameters K, a design 



matrix Dg,s{K) with rows 



ys 



ylys 



(5) 



and then interpreting Equation (2) statistically as 

zs = Dg,s{K) V+ 6, e-N{0, a^I) (6) 

where zs ^ [^^(^i), ■ ■ ■ , zs(tn)]^, Af denotes a normal density, the noise 
variance, / the identity matrix and, as above, V is the vector of maximum 
reaction rates. The appropriateness of normality, additivity and the 
uncorrelatedness of errors necessarily depends on the data generating 
and measurement processes, as well as the time intervals tj — tj-\ between 
consecutive observations, as discussed in Oates and Mukherjee (2012). 
This approximation has the crucial advantage of rendering the local re- 
action graphs Gs statistically orthogonal, such that each may be esti- 
mated independently (see Hill et al, 2012). Iterating over S eV permits 
inference concerning the complete reaction graph G. 

2.4.2 MCMC and marginal likelihoods CheMA 1.0 uses truncated 
normal priors AfrifJ^, S) with parameters /x, % inherited from the corres- 
ponding untruncated distribution. Truncation ensures non-negativity of 
parameters, while normality faciUtates partial conjugacy (see below); add- 
itional information on truncated normals is provided in the 
Supplementary Material. To simplify notation, we consider a specific 
variable S and candidate model Gs and omit the subscript in what fol- 
lows. To elicit hyperparameters /x, we follow Xu et al. (2010) and 
assume all processes occur on observable time and concentration scales, 
that is ixy, /[i^~(9(l), reflecting that the data are normalized a priori. 
For prior covariance of MichaeHs-Menten parameters %k, we assume 
independence of the components Kj, so that p{K)=Nt{K\ ii]^,vl), 
where /x^, v are hyperparameters. For the prior covariance S f of max- 
imum reaction rates, we take a unit information formulation of the trun- 
cated ^-prior, so that p(V\K, a)=AfT(V; fiy, na^(D'D)~^) where 
D = Dg,s{K) is the design matrix defined above. This implies that the 
prior contributes (approximately) the same amount of information as 
one data point, as recommended by Kass and Wasserman (1995), and 
automatically selects the scale of the prior covariance (see Zellner, 1986). 
For the noise parameter, we use a Jeffreys prior p{a) oc 1 /a. These latter 
choices render the formulation partially conjugate, permitting an efficient 
MetropoHs-within-Gibbs Markov chain Monte Carlo (MCMC) sampling 
scheme for the parameter posterior distribution, as described in detail in 
the Supplementary Material. 

To estimate marginal likelihoods from sampler output, we exploit par- 
tial conjugacy and use the method of Chib and Jeliazkov (2001). As 
inference in CheMA 1.0 decomposes over proteins Xi e V, and for a 
given protein, over local models G,-, the computations were parallelized 
(full details and software provided as Supplementary Material). 
Alternatively, MCMC could be used over the discrete space of reaction 
graphs (Ellis and Wong, 2008) or the joint space of graphs and param- 
eters (Oates et al., 2012). 

2.4.3 Interventions on the system In interventional experiments, 
data are obtained under treatments that externally influence network 
edges or nodes. Inhibitors of protein phosphorylation are now increas- 
ingly available; such inhibitors typically bind to the kinase domain of 
their target, preventing enzymatic activity. We consider such inhibitors 
in biological experiments below. Within CheMA 1.0, we model inhibition 
by setting to zero those terms in the design matrix Dq^s corresponding to 
the inhibited enzyme E in the treated samples ('perfect certain' interven- 
tions in the terminology of Eaton and Murphy, 2007; Spencer et al., 
2012). This removes the causal influence of E for the inhibited samples. 
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Fig. 2. Network inference, simulation study, (a) Reaction graph G for the MAPK signaling pathway because of Xu et al. (2010). (The model, based on 
enzyme kinetics, uses Michaelis-Menten equations to capture a variety of post-translational modifications including phosphorylation.) (b) AUPR[with 
respect to the true causal network N{G)\ for varying sample size n and noise level a. [Network inference methods: (i) LASSO, €i -penalized regression, (ii) 
TSNI, €2 -penalized regression, (iii) DBN, dynamic Bayesian networks, (iv) TVDBN, time-varying DBNs, (v) GP, non-parametric regression, (vi) 
CheMA 1.0, based on chemical kinetic models. Error bars display standard error computed over five independent datasets. (Full details provided in 
Supplementary Material.) 



3 RESULTS 

3.1 Hyperparameter specification and sensitivity 

For CheMA 1.0, we set hyperparameters [1^ = [1^ = 1, v=l/2 
and the maximum in-degree constraint ci=2; we investigated 
sensitivity by varying these parameters within (i) a toy model 
of signaling (Supplementary Fig. S3a-c) and (ii) in a subset of 
the simulations reported below (Supplementary Fig. S2). As the 
action of inhibition is second order in the Taylor expansion 
sense, inference for inhibitor variables Is,e may be expected to 
require substantially more data, in line with the 'weak 
identifiability' of second-order terms reported in Calderhead 
and Girolami (2011). A preliminary investigation based on a 
toy model of signaling revealed that at typical sample sizes in- 
ference for inhibitor sets 1s,e was extremely challenging 
(Supplementary Fig. S3d). Combined with computational con- 
siderations, we decided to fix C2 = 0 for subsequent experiments; 
that is, we did not include inhibitory regulation in the reaction 
graph. Further diagnostics, including MCMC convergence, are 
presented in the Supplementary Material. 

3.2 In silico MAPK pathway 

Data were generated from a mechanistic model of the MAPK 
signaling pathway described by Xu et al. (2010), specified by a 
system of 25 ODEs of Michaehs-Menten type whose reaction 
graph is shown in Figure 2a. This archetypal protein signaling 
system provides an ideal test bed, as the causal graph is known, 
and the model has been validated against experimentally ob- 
tained data (Xu et al., 2010). Following Oates and Mukherjee 
(2012), the Xu et al. model was transformed into a stochastic 
differential equation with intrinsic noise cr. Full details of the 
simulation setup appear in Supplementary Material. 

For inference of the network N(G), we compared our ap- 
proach with existing network inference methods that are com- 
patible with time course data: (i) €1 -penalized regression 
('LASSO'), (ii) time series network identification (TSNF; 



Bansal Qt al. 2006; this is based on ^2-penalized regression), (iii) 
dynamic Bayesian networks ('DBN'; Hill et al., 2012), (iv) time- 
varying DBNs (Dondelinger et al., 2012) and (v) Gaussian pro- 
cess regression with model averaging ('GP'; Aijo and 
Lahdesmaki, 2010). Approaches (i-iii) are based on linear differ- 
ence equations; (iv) relaxes the linear assumption in a piecewise 
fashion, whereas (v) is a semiparametric variable selection tech- 
nique. We note that because TSNI cannot deal with multiple 
time courses, we adapted it for use in this setting. Implementa- 
tion details for all methods may be found in the Supplementary 
Material. 

To systematically assess estimation of network structure, we 
computed the average area under the precision-recall (AUPR) 
and area under the receiver-operating characteristic (AUROC) 
curves. Figure 2b shows mean AUPR for all approaches, for 20 
regimes of sample size n and noise a. CheMA 1.0 performs con- 
sistently well in all regimes, and outperforms (i-v) substantially 
at the larger sample sizes. It is interesting to note that the linear 
and piecewise linear DBNs (iii-iv) perform better at moderate 
sample sizes compared with higher sample sizes, possibly because 
of model misspecification. AUROC results (Supplementary 
Fig. S6) showed a broadly similar pattern, with CheMA 1.0 
offering gains at larger sample sizes. For the kinetic parameters, 
however, we found that CheMA 1.0 struggled to precisely re- 
cover the true values 0 = {V,K], even when the reaction graph 
G was known (Fig. 3). The posterior distribution over rate con- 
stants V was much more informative than the posterior distribu- 
tion over Michaehs-Menten parameters K, consistent with 
the 'weak identifiability' of kinase inhibitors that we found in 
Section 3.1. 

To investigate dynamical prediction in the setting where nei- 
ther reaction graph nor parameters are known, we generated 
data from an unseen intervention and assessed abihty to predict 
the resulting dynamics (details of the simulation are included in 
the Supplementary Material). To fix a length scale, both true and 
predicted trajectories were normalized by maximum protein 
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expression in the test data. The quahty of a predicted trajectory 
was then measured by the mean squared error (MSB) relative to 
the (held out) data points. The network inference approaches (i- 
v) above cannot be directly applied for prediction in this setting 
(although they could in principle be adapted to do so). 
Therefore, we compared CheMA 1.0 with the analogous linear 
formulation, that replaces Equation (2) by fo^si^, Os) = Po + 
J^EeSs ^e^^e\ (s^^ Supplementary Material for details), along 



1-5 n 1 0-8 ri 1 0.8 




0 5 10 0 5 10 0 5 10 0 5 10 



Kq K2 K3 

Fig. 3. Posterior distributions over kinetic parameters when the graph G 
is known. As the number of samples n increases, the posterior mass 
concentrates on the true values much faster for the maximum reaction 
rates V (top row) than for the Michaelis-Menten parameters K (bottom 
row) 



with a simple, baseline estimator (the 'stationary benchmark') 
that presumes protein concentrations do not change with time. 
Figure 4a displays predictions for the dynamics that result from 
EPAC inhibition. Here CheMA 1.0 provides quahtatively correct 
prediction, whereas the linear analogue rapidly diverges to infin- 
ity (due to poorly estimated eigenvalues). Therefore, we focused 
only on short-term prediction, specifically the first 25% of the 
time course, for which linear models may yet prove useful. Over 
all simulation regimes and experiments, including at small 
sample sizes, we found that our approach produced significa- 
ntly lower MSE than both the linear and benchmark mo- 
dels (MSEcheMA 1.0 = 0.061, MSELm. = 2.55, MSEeench. = 0.199). 
Furthermore, CheMA 1.0 consistently produced lowest MSE 
at all fixed values of n and a (Supplementary Fig. SIO; 
P< 0.001 binomial test). 

3.3 In Vitro signaling 

Next, we considered experimental data obtained using reverse- 
phase protein arrays (Hennessy et al., 2010) from 15 human 
breast cancer cell lines, of which 10 were of HER2+ subtype 
(Neve et ai, 2006). These data comprised observations for key 
phosphoproteins AKT, EGFR, MEK, GSK3ab, S6, 4EBP1 and 
their unphosphoryated counterparts. Data were acquired under 
pretreatment with inhibitors Lapatinib ('EGFRi'; an EGFR/ 
HER2 inhibitor), GSK690693 ('AKTi'; an AKT inhibitor) and 
without inhibition (DMSO) at 0.5, 1, 2, 4 and 8h following 
serum stimulation, giving n = \5 observations of each species 
in each cell line (see Supplementary Material for full experimen- 
tal protocol). 



(a) 



(b) 
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Fig. 4. Predicting dynamical response to a novel intervention: (a) predicting the effect of EPAC inhibition under the data generating model of Xu et al. 
(2010). [CheMA (solid) regions correspond to standard deviation of the posterior predictive distribution. Linear (dashed) replaces the non-linear 
chemical kinetic models with simple linear models. The stationary benchmark (dotted) simply uses the initial data point as an estimate for all later 
data points. The true test data are displayed as crosses. Here n = 100, a = 0.1.] (b) Assessing prediction over a panel of 15 breast cancer cell lines. 
(Training data were time series under treatment with a single inhibitor; test data represented a second held-out inhibitor. Normalized MSE was averaged 
over all protein species and all time points.) 
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Assessment of inferred network topologies for the cell lines is 
challenging because the true cell line- specific networks are not 
known. Inferred topologies partially agree with known signaling 
(Supplementary Fig. Sll), but the latter is based mainly on stu- 
dies using wild-type cells and may not reflect networks in genet- 
ically perturbed cancer lines. Therefore, to assess performance, 
we also considered the problem of prediction of trajectories 
under an unseen intervention, where objective assessment is pos- 
sible. We sought to compare performance of CheMA 1.0 against 
a literature-based ODE model (reaction graph G fixed according 
to literature and dynamics /q as described above) fitted to train- 
ing data. No prior information concerning specific chemical 
reactions was provided to CheMA 1.0. This problem is highly 
non-trivial because of the small sample size, uneven sampling 
times and the complex observation process associated with prote- 
omic assay data. 

Training on DMSO and EGFRi (or AKTi) data, we assessed 
ability to predict the full dynamic response to AKT (or EGFR) 
inhibition. In this way, each held-out test set contained trajec- 
tories under a completely unseen intervention. By considering all 
15 cell lines, giving 30 held-out datasets, we found that in 19 of 
30 prediction problems CheMA 1 .0 outperformed the literature 
predictor (Fig. 4b). As expected (and as in the case of the simu- 
lated data), the linear model was not well behaved for prediction 
(Supplementary Fig. S12) and is not shown. In the AKTi test, of 
the 10 HER2+ cell lines, 9 were better predicted by CheMA 1.0 
compared with literature prediction (P = 0.01, binomial test; 
MSEcheMA 1.0 = 0.064 versus MSEut. = 0.274). Conversely, four 
of five HER2- lines were better predicted by literature 
(MSEut. =0.145 versus MSEcheMA i.o = 0.240), suggesting that 
signaling network topology in HER2+ lines may differ to the 
(wild type) literature topology, in line with the literature on the 
cell lines (Neve et al., 2006). This is encouraging from the per- 
spective of CheMA, as a priori it is far from clear whether the 
training data, which involved only P = 6 species and n = \0 data 
points, contain sufficient information to predict the effect of an 
unseen intervention, even approximately. However, in two of the 
failure cases (HCC 1569, HCC 1954; EGFRi test) CheMA 1.0 
produced extremely poor predictions (MSEcheMA i.o > 1), likely 
because of the small training sample size. 

4 DISCUSSION 

We proposed a general framework for using chemical kinetics in 
network inference and dynamical prediction. The use of chemical 
kinetics can be expected to contribute gains in causal inference 
because the underlying models are not structurally symmetric, 
allowing causal directionality to be estabhshed (Peters et al., 
2011). In empirical results, we found that while CheMA 1.0 
struggled to identify kinetic parameters from data, it was never- 
theless able to identify the causal network; this discrepancy is 
explained by the fact that the latter is in a sense a projection 
of the former, and can be identifiable even when the full set of 
parameters are not. 

An important challenge in systems biology is to predict the 
effect on signaling of a novel intervention, such as a drug treat- 
ment. At present, dynamical predictions in systems biology re- 
quire a known chemical reaction graph, for instance, taken from 
the literature; a system of ODEs is usually specified based on 



such a graph and used for prediction. However, in many settings, 
the chemical reaction graph may differ depending on cell type or 
disease state and cannot be assumed known. In contrast, CheMA 
shows how prediction of dynamical behavior may be possible 
even when the reaction graph itself is unknown a priori. 
Unlike more convenient linear or discrete formulations, our 
use of chemical kinetic models provides interpretable predictions. 
For example, the dynamic behavior of phosphoprotein concen- 
trations obtained under chemical kinetic rate laws is physically 
plausible (i.e. smooth, bounded and non-negative). Furthermore, 
by averaging predictions over reaction graphs, our approach 
should provide robustness in (typical) situations where it is un- 
reasonable to expect to identify G precisely. Nevertheless, pre- 
diction of trajectories based on the protein data was challenging, 
likely because of noise and small sample sizes (Supplementary 
Fig. SI 3). We anticipate that continuing technical advances will 
move high-throughput proteomics closer to the favorable simu- 
lation regimes in Section 3.2 on which we found the richer non- 
linear models to be useful. 

Several improvements can be made to the CheMA 1.0 imple- 
mentation reported here, of which we highlight two: (i) gradient 
matching (rather than numerical solution of the automatically 
generated dynamical systems) can help to relieve the computa- 
tional demands associated with exploration the large model 
spaces, but the Euler approximations we used for this purpose 
are crude. Improved gradient matching should be possible (at the 
expense of requiring more time points) via higher-order expan- 
sions, or (at the expense of additional modeling assumptions) 
kernel regression, the penalized likelihood approaches of 
Gonzalez et al. (2013); Ramsay et al. (2007), or the Bayesian 
approach of Dondelinger et al. (2013). (ii) CheMA 1.0 does 
not explicitly distinguish between process noise and observation 
noise; an interesting direction for further research would be to 
incorporate an explicit observation model. 

Two ongoing challenges in Bayesian computation relevant to 
CheMA include inference of model parameters and computation 
of marginal likelihoods for model selection. The second is an 
active area of research, with candidate approaches including 
variational approximations (Rue et al., 2009) and MCMC 
(Vyshemirsky and Girolami, 2008). In general, the computa- 
tional burden of CheMA will be higher than many methods 
(see Supplementary Material). By way of illustration, Bayesian 
inference and prediction for a system of 27 protein species re- 
quired over 12 h (serial) computational time. In contrast, linear 
or discrete models offer better scalability to high-dimensional 
settings. Thus, CheMA can complement existing methodologies 
but is not at present applicable to truly high-dimensional prob- 
lems with hundreds or thousands of nodes. 

Finally, we note the following caveats: (i) the automatic gen- 
eration of kinetic equations limits the extent to which detailed 
knowledge about particular biochemical processes and dynamics 
may be incorporated, (ii) Our empirical results suggest that more 
complex interactions, including kinase inhibition, can be ex- 
tremely difficult to identify in practice, (iii) The form of kinetics 
used here will likely be suboptimal when the assumptions of the 
Michaelis-Menten approximation are violated, (iv) Larger train- 
ing and test datasets may be needed to allow truly effective 
trajectory prediction and comprehensive assessment of 
performance. 
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